JetBrains × Codex Hackathon finalists: the IDE as the "reasoning layer" of an AI agent

Source: https://blog.jetbrains.com/ai/2026/05/meet-the-finalists-jetbrains-codex-hackathon/

Summary

The post sums up the first joint JetBrains and OpenAI Codex hackathon: roughly 40 submissions over a single weekend. The main idea is that an IDE with a capable coding model built in stops being "a place where you write code" and becomes a place where you direct an agent, watch how it reasons, manage what it pays attention to, and decide whether to accept its output. The six finalist projects:

"hyperreasoning" (1st place, Aditya Mangalampalli) replaces a single model call with a search process: the system generates several approaches, and a learned controller chooses which to expand, which to discard, and which to verify against tests. Compiler errors and failing tests are fed back into the loop. An IDE tool window renders the search in real time. The project's argument is that a small local model wrapped this way can hold its own against frontier models at noticeably lower cost.
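The search loop described above might be sketched roughly as follows. This is an illustration only, not the project's implementation: `Candidate`, `search`, and the `score`, `expand`, and `verify` callables are hypothetical stand-ins for the learned controller, the model call, and the compile-and-test step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    plan: str     # text of an approach or partial solution
    score: float  # controller's estimate of how promising it is


def search(
    drafts: List[str],
    score: Callable[[str], float],             # stand-in for the learned controller
    expand: Callable[[str, str], List[str]],   # stand-in for a model call: (plan, error) -> refinements
    verify: Callable[[str], Optional[str]],    # stand-in for build + tests: None on success, else an error
    beam: int = 2,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first plan that verifies, or None if the budget runs out."""
    frontier = [Candidate(d, score(d)) for d in drafts]
    for _ in range(max_rounds):
        # Keep the most promising candidates and discard the rest.
        frontier.sort(key=lambda c: c.score, reverse=True)
        survivors = frontier[:beam]
        next_frontier: List[Candidate] = []
        for cand in survivors:
            error = verify(cand.plan)
            if error is None:
                return cand.plan
            # Feed the failure back into the next round of drafts.
            for child in expand(cand.plan, error):
                next_frontier.append(Candidate(child, score(child)))
        frontier = next_frontier
        if not frontier:
            break
    return None
```

The beam width and round limit are arbitrary choices for the sketch; a real controller would presumably learn when to expand, cut, or verify rather than follow a fixed schedule.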

"Scopecreep" (2nd place) collapses hardware bring-up into a single IDE tool window: the schematic, oscilloscope, power supply, device terminal, and results table no longer live in separate windows. Given a schematic, the agent picks signals, captures measurements, and generates a report. When a probe has to be moved physically, the session pauses and a prompt shows exactly where it goes; the engineer places the probe and clicks Resume. This is hybrid autonomy: autonomous where possible, human-in-the-loop where the work touches hardware.
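One way to picture the pause-and-resume behaviour is a generator that yields a placement prompt before each measurement. This is a sketch under assumptions, not Scopecreep's code: `bring_up_session`, `run`, the `measure` callable, and the test-point names are all hypothetical.

```python
from typing import Callable, Dict, Generator, List


def bring_up_session(
    signals: List[str],
    measure: Callable[[str], float],  # stand-in for reading a real instrument
) -> Generator[str, None, Dict[str, float]]:
    """Yield a placement prompt before each measurement; return the report."""
    report: Dict[str, float] = {}
    for signal in signals:
        # Pause: control goes back to the engineer until the session is resumed.
        yield f"Place the probe on {signal}, then resume"
        # Resumed: the measurement itself is automated.
        report[signal] = measure(signal)
    return report


def run(session: Generator[str, None, Dict[str, float]],
        on_prompt: Callable[[str], None]) -> Dict[str, float]:
    """Drive a session, calling on_prompt at every pause."""
    try:
        prompt = next(session)
        while True:
            on_prompt(prompt)       # e.g. show the prompt and wait for a click
            prompt = next(session)  # the "Resume" step
    except StopIteration as done:
        return done.value


if __name__ == "__main__":
    readings = {"TP1": 3.3, "TP2": 1.8}  # made-up values for illustration
    print(run(bring_up_session(["TP1", "TP2"], readings.__getitem__), print))
```

The generator keeps the automated part (measuring) and the manual part (placing the probe) in one linear flow, which is the split the project describes.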

"mesh-code" (3rd place) gives agents shared memory of the in-progress project: what has been tried, what has been decided, and what is still pending. A session started on one laptop can continue on another with any available agent, including Codex.
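The post does not say how mesh-code stores this memory. As a minimal sketch, assuming a plain JSON file that is synced between machines by some other means, the three categories could be modelled like this (`SessionMemory` and its fields are hypothetical names):

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class SessionMemory:
    tried: List[str] = field(default_factory=list)    # approaches already attempted
    decided: List[str] = field(default_factory=list)  # decisions that should not be reopened
    pending: List[str] = field(default_factory=list)  # work still in the queue

    def save(self, path: Path) -> None:
        """Write the memory as JSON so another agent or machine can pick it up."""
        path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "SessionMemory":
        """Rebuild the memory from a previously saved JSON file."""
        return cls(**json.loads(path.read_text(encoding="utf-8")))
```

An agent resuming on a second laptop would call `SessionMemory.load(...)` before doing anything else, so it does not repeat what is already listed under `tried`.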

"Latent Signal – Periscope" is a JetBrains plugin built on Wes McKinney's open-source agentsview. It shows what is filling the agent's working memory turn by turn and recommends what to do next: continue, rewind to a better branching point, compact, fork, or hand the session off. It runs locally and works with most agents, including Codex.

"SecureLoop" turns security incidents into a controlled loop inside JetBrains: the agent gathers the relevant code, the security rules, and the state of the dependencies, asks Codex for a structured diagnosis and a fix, and runs the fix through automated checks. The PR is opened automatically; the merge is not. Everything that informed the decision (the diff, the policy, the test) is shown in the IDE for approve/reject. The team's idea is to keep a security-policy.md file next to the README in which the project describes its rules for handling secrets, errors, and risky patterns; coding agents read it before making changes.
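The "PR opens automatically, merge does not" rule can be expressed as a small state machine. The sketch below is an assumption about the shape of such a gate, not SecureLoop's code; `Proposal`, `run_checks`, `review`, and the status names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    diff: str
    evidence: List[str] = field(default_factory=list)  # diff, policy hit, test: shown to the reviewer
    status: str = "proposed"  # proposed -> pr_open -> merged / rejected, or checks_failed


def run_checks(proposal: Proposal, checks: List[Callable[[str], bool]]) -> Proposal:
    """Open the PR automatically, but only if every automated check passes."""
    if all(check(proposal.diff) for check in checks):
        proposal.status = "pr_open"
    else:
        proposal.status = "checks_failed"
    return proposal


def review(proposal: Proposal, approved: bool) -> Proposal:
    """Merging is never automatic: it needs an explicit human decision."""
    if proposal.status != "pr_open":
        raise ValueError("only an open PR can be reviewed")
    proposal.status = "merged" if approved else "rejected"
    return proposal
```

Because `review` is the only function that can set `merged`, no code path reaches a merge without a human passing `approved=True`.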

"Pinpoint" addresses the vagueness of frontend change requests ("move that element"): the developer drops pins directly on the live page, attaches comments, and sends them to the agent as a batch with precise DOM/visual context. It ships in two parts: a browser tool for web pages and a desktop companion for arbitrary interfaces.
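The post does not describe Pinpoint's payload format. Purely as an illustration, a batch of pins could be serialized like this, with `Pin`, `build_batch`, and every field name being assumptions:

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Pin:
    selector: str  # CSS selector identifying the pinned element
    x: int         # click position on the page, in pixels
    y: int
    comment: str   # what the developer wants changed


def build_batch(page_url: str, pins: List[Pin]) -> str:
    """Bundle all pins for one page into a single JSON message for the agent."""
    return json.dumps({"page": page_url, "pins": [asdict(p) for p in pins]}, indent=2)
```

Sending the selector together with the comment is what removes the "which element?" ambiguity; the comment still has to say what change is wanted.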

Example

<!-- security-policy.md next to README.md (the SecureLoop idea) -->
# Project security policy

## Secrets
- Never log the values of variables whose names match /(?i)token|secret|key|password/
- Use only the `secure_get(name)` wrapper from `internal/secrets.py`

## Errors
- Return only status codes to callers, never a stack trace
- Full traces go only to `internal/observability/log.py`

## Risky patterns
- `eval()`, `exec()`, and `pickle.loads(...)` are forbidden on external input
- SSRF: make URL requests only through `safe_http_client`
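A policy like the one above only helps if something checks changes against it. As a rough sketch, assuming the agent receives its proposed change as a unified diff, the "Risky patterns" rules could be screened as below. `RISKY` and `risky_added_lines` are hypothetical, and the check is deliberately coarser than the policy: it flags every new call, not only calls on external input.

```python
import re
from typing import List

# Patterns taken from the "Risky patterns" section of the example policy.
RISKY = re.compile(r"\b(?:eval|exec)\s*\(|\bpickle\.loads\s*\(")


def risky_added_lines(diff: str) -> List[str]:
    """Return the added lines of a unified diff that match a risky pattern."""
    hits: List[str] = []
    for line in diff.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            if RISKY.search(line):
                hits.append(line[1:].strip())
    return hits


if __name__ == "__main__":
    sample = (
        "--- a/app.py\n"
        "+++ b/app.py\n"
        "-x = eval(data)\n"
        "+x = json.loads(data)\n"
        "+y = pickle.loads(blob)\n"
    )
    print(risky_added_lines(sample))
```

Removed lines are ignored on purpose, so a patch that deletes an `eval()` call is not penalized for it.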

Significance

The finalists add up to one story: the IDE becomes the agent's reasoning layer, an observable control environment rather than a "backend autocomplete function". This line is consistent with JetBrains' earlier April posts ("The IDE Is Already an AI Quality Variable", April 30). The "security policy as a file in the repo" idea matches the trend toward agent-readable rules (see Anthropic Skills, GitHub Copilot rules, Cursor .cursorrules). The market for IDE-integrated agentic environments is one of the key competitive fronts of 2026 between JetBrains, Microsoft (VS Code + Copilot Workspace), and Cursor.

🧾 Transcript (formatted)

Meet the Finalists: JetBrains x Codex Hackathon

Source: https://blog.jetbrains.com/ai/2026/05/meet-the-finalists-jetbrains-codex-hackathon/

Put a capable coding model inside a developer’s primary workspace, and the IDE stops being a place where you write code. It becomes a place where you direct an agent, watch how it reasons, manage what it pays attention to, and decide when its output is worth shipping. That was the defining theme of the inaugural JetBrains x Codex Hackathon: across roughly 40 submissions over a single weekend, teams explored what it actually means to build with AI natively inside the IDE – not bolted on top of it. The six finalists came up with some of the most compelling answers.

🥇 First Place: hyperreasoning – Aditya Mangalampalli

Most coding agents call the model once and hope for the best. As Aditya puts it: “LLMs spend a lot of time thinking in circles.” Hyperreasoning replaces the single shot with something closer to a search: the system drafts several possible approaches to a task, then a learned controller decides which to expand, which to cut, and which to verify against tests. Compiler errors and failing tests feed back into how the controller weighs its options.

Inside the IDE, a tool window renders the search live, so you can watch which paths the controller explored before settling on one. The argument the project makes is that a smaller local model wrapped in this kind of verified search loop can hold its own against much larger frontier models at meaningfully lower cost – with the IDE serving as the place where reasoning becomes visible and directable, rather than a black box that returns code.

🥈 Second Place: Scopecreep – Bhavik Sheoran, Kenneth Ross, Roman Javadyan, Joon Im

Hardware bring-up is a tool-juggling exercise: schematic viewer in one window, vendor apps for the oscilloscope and power supply in others, a terminal talking to the device, a spreadsheet collecting results. Scopecreep collapses that into a single JetBrains tool window. Hand it a circuit schematic and an agent works through testing the board – picking signals worth measuring, capturing the readings, and producing a report.

The design choice worth noticing: when the agent decides a probe needs to be placed, the session pauses and shows the engineer exactly where to put it. The engineer places the probe physically and clicks Resume. It’s the right call for real instruments on a real bench – autonomous where a computer can be trusted, human-in-the-loop where the work touches the physical world.

🥉 Third Place: mesh-code – Ayush Ojha, Coco Cao, Kush Ise, AL DRAM

Switch machines mid-task, and your coding agent starts over. mesh-code fixes that by giving agents shared memory of an in-progress project – what’s been tried, what’s been decided, what’s still pending – so a session that begins on one laptop can continue from another, with whichever agent happens to be available. Codex is one of the agents that can plug in.

Latent Signal – Periscope

Long agent sessions accumulate dead weight: tool outputs nobody needs anymore, dead ends, context that was useful ten turns ago and isn’t now. Periscope, built on Wes McKinney’s open-source agentsview, is a JetBrains plugin that shows what’s actually filling up an agent’s working memory turn by turn – and recommends what to do about it, whether that’s continuing, rewinding to a better branching point, compacting, forking, or handing off entirely. It works with Codex and most other coding agents, and everything stays local.

SecureLoop – Abhiram Sribhashyam, Rahul Marri, Peyton Li

Security incident response is still mostly copy-paste: stack trace into a chat window, repo context explained by hand, a fix written and committed in the hope it’s safe. SecureLoop turns that into a controlled loop inside JetBrains. When something breaks in production, the agent gathers the relevant code, the project’s security rules, and the state of its dependencies, then asks Codex for a structured diagnosis and a proposed fix. That fix runs through automated checks before any pull request opens.

The PR opens automatically. The merge does not. SecureLoop surfaces everything that informed the decision – the diff, the policy it bumped into, the test that proved the patch – inside the IDE for the developer to approve or reject. As the team put it: “Codex fully makes the PR ready for you, and it remains human-in-the-loop where you have to approve or deny.”

The team’s bigger thesis is a security-policy.md file that lives in the repo alongside README.md, spelling out a project’s specific rules for handling secrets, errors, and risky patterns. Coding agents read it before suggesting changes, so the question stops being “what’s a good fix?” and becomes “what’s an acceptable fix under this codebase’s rules?”

Pinpoint – Het Patel

Frontend feedback delivered through a chat window is unavoidably vague. “Move that element” or “change that color” leaves the agent guessing which element you actually mean. Pinpoint takes that piece of the ambiguity off the table: developers drop pins directly on a live page, attach a comment to each, and send the whole batch to the agent with precise on-page context attached. The agent now knows exactly which element you meant – even if it still has to figure out what change you want.

The project ships in two pieces: one for annotating web pages in a browser, and a desktop companion for marking up anything visible on screen – useful when the interface in question isn’t a web page.

What the finalists show

Looking across these six projects, a clear pattern emerges. Codex embedded in the IDE isn’t just a faster way to write code – it’s a reasoning layer you can watch think, a structured output engine you can direct, a participant in workflows that span hardware instruments, production alerts, shared session state, and context windows. And the IDE becomes the place where all of that comes together: visible, controllable, and version-controlled.

That’s the possibility these teams spent a weekend proving out, and it’s only the beginning.
